ABSTRACT


A great deal of study has been conducted in the last ten 
years concerning the use of voice recognition equipment with 
computers. It was hoped that its use would reduce the
required entry time and error rate, and improve the man-
machine interface between the user and the computer.

There are many potential applications for such voice 
recognition use in the military, and specifically in the area 
of Command, Control and Communications (C3). War games are
often used today to test the effectiveness of C3 technologies,
and WES is one such war game. 

This paper will assess the feasibility of using voice 
recognition equipment to run WES by comparing the results of
an experiment employing both voice and manual typing input 
modes. The results show that in this particular task typing
does a somewhat better job than the buffered voice mode, while
unbuffered voice has very poor results.

I. INTRODUCTION


The cost of computer hardware has dropped dramatically 
in recent years, and the use of computers throughout our 
society has skyrocketed to help us manage the glut of data
with which we are often presented and to solve the increasingly
complex problems of the present and future. Historically,
data has been entered into the computer by keypunch,
which can be slow, monotonous and error-filled for all but 
the very well-trained. Researchers have looked for a better, 
more efficient man-machine interface than the keypunch, and 
as early as the 1950's they realized that the most natural 
type of communication which we as humans use is speech. So 
why not simply speak to a computer as you would to a fellow 
worker and have the computer perform whatever task you have
directed?


A. A BRIEF HISTORY OF VOICE TECHNOLOGY
1. General Background

Voice recognition systems have received quite a bit 
of interest since the 1950's, mainly during the past fifteen 
years. Automatic speech recognition, per se, is concerned 
with automatically determining linguistic messages spoken to 
the voice recognizer by comparing them to acoustic data stored 
in the recognition system. Both industry and the military
have decided to study the feasibility of incorporating
interactive voice recognition systems in their computer
operations in order to have a more natural interface with the
computer, to increase the speed of data entry and retrieval
and thereby increase throughput, and to lower the input error
rate. A voice recognition system, using one's natural language,
would certainly seem to have the potential for reducing
errors at the man-machine interface. In addition, the
higher-level personnel in industry and the military, those 
specifically who must make the important decisions and who 
most need the decision-making aid of the computer, are those 
least likely to sit at the keyboard and use the computer. 

So it was thought that voice interaction would help these 
high-level personnel become more direct users of the systems 
on which they depend. 

Interactive voice recognition systems (i.e., those
which give either a spoken or a displayed response to a verbal
input) can be basically divided into two categories: 
isolated word recognizers and continuous speech recognizers. 
Isolated word recognition systems were the first type 
developed and by far the easier of the two to engineer and 
construct. An isolated word recognizer, with a limited vocab- 
ulary of x number of words or utterances (short phrases), 
must simply recognize the utterance spoken to it and respond 
as programmed. This recognition is accomplished by "training" 
the system prior to its use. Anyone who will be using the
system trains it by repeating the various vocabulary words a
number of times, usually between five and ten, with different
inflection, stress, pronunciation, etc., while in a "training
mode." The parameters of the pronunciation of each utterance
are then averaged and stored in the digital speech processor 
memory of the system. Then when a word or phrase is spoken 
to the recognition system, its parameters are compared 
digitally to all those stored in memory and hopefully a match 
is found and the proper response made by the computer [1]. 
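
The train-then-match cycle described above can be sketched in a few
lines. This is only an illustration under stated assumptions: each
utterance is reduced to a short list of numeric "parameters" (the
feature values, the Euclidean distance measure and the rejection
threshold are all hypothetical, and are not taken from any particular
recognizer).

```python
import math

def train_template(repetitions):
    """Average several parameter vectors for one utterance into a template."""
    n = len(repetitions)
    return [sum(values) / n for values in zip(*repetitions)]

def recognize(features, templates, max_distance):
    """Return the closest vocabulary entry, or None if no template is close enough."""
    best_word, best_dist = None, math.inf
    for word, template in templates.items():
        dist = math.dist(features, template)  # Euclidean distance (assumed measure)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word if best_dist <= max_distance else None

# Hypothetical two-parameter training data: three repetitions per word.
training = {
    "COURSE": [[1.0, 0.0], [0.8, 0.2], [1.2, 0.1]],
    "SPEED": [[0.0, 1.0], [0.1, 0.9], [0.2, 1.1]],
}
templates = {word: train_template(reps) for word, reps in training.items()}

print(recognize([0.9, 0.1], templates, max_distance=0.5))  # COURSE
print(recognize([5.0, 5.0], templates, max_distance=0.5))  # None (rejected)
```

The rejection threshold stands in for the "hopefully a match is found"
step: an input that is far from every stored template is refused
rather than forced onto the nearest word.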

While this indeed sounds like a complex process, consider
what the continuous speech recognition system must do.
In addition to all the above it must be able to recognize 
word sequences and digit strings. It must be able to find 
boundaries between words, or segment the utterance either 
explicitly or implicitly by trying to fit together sequences 
of word pronunciations before the final classification 
process [2,3]. It is difficult to analyze the beginnings and 
endings of words unless adjacent words are known; it is much 
easier to recognize words spoken in isolation, or separated
by short pauses, than those with no pauses between them.
However, it is very unnatural for humans to pause after speaking
each word in a sentence, and although the first isolated
word recognition systems have been in use since 1972, further 
study into advanced systems has continued. 

2. Some Past Uses of Voice Recognition Systems 
Beginning in 1972, there have been several successful
uses of interactive voice recognition systems in industrial
settings. These have been strictly isolated word systems up
to this time. It has been found that by using voice systems 
to interact with a computer, a worker's hands and eyes are 
both free to continue their tasks. Data entry is thereby
faster, because the worker does not have to stop what he is
doing, write down or directly enter data, and then return to
where he left off. Voice also
cuts down on the number of errors often encountered in this 
process or in other processes where the first worker must 
relay information to a second worker who then enters what he 
heard (perhaps incorrectly) into the data system. 

Airlines were the first to use voice recognition to 
input data to a computer for the correct routing of baggage 
to various aircraft. It was found very efficient to allow 
the baggage handler to input data by voice, freeing his hands 
and eyes to look at and handle the pieces of luggage. Banks 
have been able to accomplish paperless transfers of funds, 
dividends, retirement payments and the payments of bills by 
simply speaking the dollar amount to be transferred to the 
voice recognition system. Quality assurance checks on 
manufactured goods have been greatly simplified and speeded 
up in many cases by allowing inspectors to use their hands 
and eyes for the inspections while simultaneously inputting 
data to a computer by voice. In addition to these few
examples of discrete speech recognition there are many other
areas where voice recognition systems either are currently
being used, or could easily be used in the future [1,4].


B. STUDY AND TESTING OF VOICE RECOGNITION SYSTEMS

Although discrete speech recognition systems have been,
and are, in use, research has continued on both the discrete
and continuous speech systems. Probably the largest such
study undertaken to date has been the Advanced Research
Projects Agency (ARPA) five-year, $15 million Speech Understanding
Research (SUR) project begun in 1971. This project
was designed to provide a breakthrough in the handling of 
spoken sentences, by the use of higher-level linguistic 
information and specific task-dictated constraints on what 
could be said [5]. It was thought that this was an important 
project because of increased industrial interest in speech 
recognition, government interest in future programs, the work 
of several foreign countries in the field, and projected 
future widespread applications. In 1978 the projected ten-year
sales of 2.5 million speech processing units ($4.8
billion) seemed to lend a great deal of credence to these
points [5].

The SUR project was concerned with understanding as 
opposed to simply word recognition. By understanding was 
meant having the system interpret an utterance and respond 
correctly. The project was designed to be highly task
oriented, and to have speech analyzed and interpreted in
the context of a task, rather than interpreting each word or
component of the utterance individually [3,6]. Other goals 
included a working vocabulary of 1,000 words for the system 
and accuracy of 90 percent averaged over several different 
speakers. 

The SUR project was designed to develop several inter- 
mediate "throw-away" systems rather than to work toward one 
carefully designed ultimate system. With this in mind there 
were five main system contractors and four specialist contrac- 
tors engaged in the research at the start of the SUR project. 
The five main contractors were Bolt, Beranek and Newman (BBN); 
Carnegie-Mellon University; Lincoln Laboratory; System Develop- 
ment Corporation; and SRI International. The four specialist 
contractors were Haskins Laboratory; Speech Communications 
Research Laboratory; Univac; and the University of California 
at Berkeley [7]. 

Approximately halfway through the five-year project, the
three systems which seemed to be farthest along in meeting
ARPA's goals were selected to continue the project. When SUR
ended in September 1976 it was generally agreed that it had 
greatly advanced the state of the art in continuous voice 
recognition and that cost-effective speech input was a 
plausible scientific and technical goal [6]. One of the 
final three systems, called HARPY, developed by Carnegie-
Mellon University, met all of ARPA's initial goals. Using
a vocabulary of 1,011 words and five different speakers,
HARPY achieved a total sentence accuracy (i.e., all words
correct) and semantic accuracy (i.e., correct response ac- 
curacy) of over 90 percent for the specific task of document 
retrieval [5,6]. The other two systems tested, HWIM (for 
Hear What I Mean, by BBN) and HEARSAY II (by Carnegie-Mellon) 
fell somewhat short of the stated objectives. 

Another more recent study of voice technology was done 
for the Rome Air Development Center, Rome, New York. In this 
project the use of voice systems to input cartographic data 
for the Defense Mapping Agency Aerospace Center was studied. 
It was found that voice input was faster, more accurate and
easier to use than the paper, pencil and keypunch that were
then in use. In addition, the voice system eliminated
the need for skilled typists to interact with the computer. 
It was found that the speed of data entry for inexperienced 
personnel was much higher for voice than for those at a key- 
board who were not skilled typists, indicating that much less 
training was required to operate the voice recognition system 
than to skillfully use the keyboard [8]. For this particular 
task, and for others as well, since voice is the most natural 
mode of communication, it was hoped that its performance level 
would be higher than manual input with a minimum of training. 

A final example of a recent voice recognition study [9], 
carried out at the Naval Postgraduate School (NPS), compared 
the uses of manual and voice inputs to run a distributed
computer network. Using twenty-four military officers as
subjects operating on the ARPA Network, and using a fixed
scenario of instructions, it was found that voice input -
again with minimal voice practice - was 17.5 percent faster
than manual typing input, and manual input had 183.2 percent
more entry errors than did the voice input. It is presumed
that an even greater difference would have been recorded had
experienced voice input subjects been used.


C. POSSIBLE MILITARY USES OF VOICE RECOGNITION SYSTEMS

The military is also carefully studying the use of voice 
interactive systems for many varied applications. The author 
has encountered several possible Navy applications which are 
prime candidates for voice recognition use. In the area of 
tactical data systems, normally data has been directly entered 
from remote sensors or by an operator at a keyboard, and then 
either acted upon or retrieved by the operator. Voice systems 
can greatly facilitate the operator's data entry or retrieval 
by allowing him to interact vocally with the system rather 
than requiring a skilled typist at the keyboard. This should 
reduce the time needed for interaction and the possibility of 
many errors [6]. 

A study at NPS addressed the possibility of using a voice 
recognition system as the interface between a ship's Tactical 
Action Officer (TAO) and the Naval Tactical Data System (NTDS) 
computer in order to reduce reaction time. This study also 
postulated the use of a voice synthesizer to output the
information requested from the computer. The authors felt
that there is an incompatibility between a discrete speech
system and other communications which the TAO uses. During
a period of tension, it might be difficult to use discrete
speech with one system and continuous modulation on others.
It was also felt that a discrete speech recognition system
would not be compatible with the rapid pace of a TAO's
duties [10]. Further study should be done in this area.

Naval aviation is a field where there are a great many
possibilities for the use of voice systems. One study [11] 
reported investigating the feasibility of using a Voice 
Recognition and Synthesis (VRAS) system with the Advanced 
Integrated Display System (AIDS) on Navy aircraft. VRAS, a 
software package of real-time voice processing routines, when 
used with the AIDS cockpit information system would provide a 
much improved man-machine interface between the pilot and the 
onboard computer. The voice interactive system in this case 
could handle complex tasks encountered in an airborne environ- 
ment and could free the eyes and hands of the pilot for other 
tasks. Some possible uses would include selecting a missile 
verbally vice manually, and having this confirmed verbally, 
thereby allowing the pilot to fly a better intercept profile. 
The system could be used for reporting (e.g., "report air- 
speed"), data entry, systems checks where VRAS reports when 
a checklist is complete, and so on. It is thought that this
might help reduce the clutter of instrumentation and fault
warning displays in the aircraft. In addition, it was even
postulated that a speech recognition system together with an
adequate display system could substitute for a second man in 
an F-14/A-6 type aircraft. It could save space, and reduce 
weight, fuel consumption, manpower, training and the life 
cycle costs of an aircraft [6]. 

Other military areas where voice recognition systems 
could be used might include command centers, combat informa- 
tion centers on board Navy ships, inputs for weapons fire 
control systems, and air traffic control. Very interesting 
and relevant research is presently being done at NPS on the 
possibility of using voice systems for the military photo 
interpreter and for use with the Joint Chiefs of Staff (JCS) 
Emergency Action Message (EAM) system. Appendix A lists voice
recognition studies which have been, or are presently being,
conducted at NPS.

Although a good deal of research has been done on the 
feasibility and design of interactive voice recognition sys-
tems, much is yet to be done. For instance, how do you improve
the acoustic phonetic analysis ability of a system so that it
is able with a high degree of accuracy to understand continuous
voice commands from a large number of people? Is there really
even a need for continuous voice recognition systems? They
would certainly be nice, and they are much more "natural" than
isolated word systems for a human user, but what is the op-
portunity cost of developing them? These questions are now
being answered and will be answered in the future, thanks in
great part to the impetus of the SUR project.

II. BACKGROUND


The area of Command, Control and Communications, or C3,
has been an integral part of human existence since the begin-
ning of civilization, although it has gone by different titles
and has had slightly different shades of meaning. There is a
great deal of difficulty even now in defining and quantifying
this "new" area of C3. It is definitely a process, it in-
volves equipment and individuals, and also goals or missions.
To this author C3 is a process or means by which a military
commander (or civilian authority) exercises authority and
direction in allocating scarce resources (e.g., money, troops,
ships, etc.) in order to achieve organizational goals in the
most efficient manner possible.


A. VOICE RECOGNITION IN C3


In his action of directing or allocating resources, in
performing the vital elements of C3, the commander must inter-
act with individuals and equipment. Several of the military
examples of speech recognition study in Section I fall within
this area of C3. These examples included the TAO-NTDS inter-
face, use of voice recognition in a command center or CIC and
use of voice recognition by a pilot in the cockpit of an
aircraft. Each of these certainly depicts a command and
control situation where voice recognition systems might be of
use. Additionally, the example [9] of the increased speed of
input and lower error rates provided by voice input while
controlling a distributed computer network certainly points
to the possible use of voice for C3 purposes.

Several features make speech recognition potentially very 
useful in the area of C3. It is felt that there will be a
closer coupling of the commander with the system he depends on
when using speech inputs. Most commanders would never tie
themselves down to a keyboard during any crisis or battle
situation. With the use of speech recognition and a wireless
microphone there would not be this feeling of being tied 
down. There would also be more centralization of control in 
a crisis situation. This would result in increased speed of 
interaction with the system, and a more effective use of the 
new support technology available [6]. 

In a C3 environment, voice systems could certainly be
used for data input and retrieval. A Task Force commander 
would directly use such a system for information management 
and evaluation, as an aid in decision making, and for decision 
dissemination. The closer a commander can be to the system 
upon which he bases his decisions, the better the quality of 
his decisions should be, with greater avoidance of serious 
error. Command language also is of limited complexity with a
rather large vocabulary to cover many possibilities, and this
should suit it well to a voice recognition system.

B. WARGAMING/SIMULATIONS FOR MEASURING C3 EFFECTIVENESS


One of the major problems in the C3 arena has been how to
measure the effectiveness of a C3 system. Since C3 must func-
tion in distinctly different conditions (e.g., peacetime,
periods of crisis, conventional or nuclear war) this becomes
increasingly difficult. How does one gauge or measure
whether a Command and Control system will function in a
nuclear war? More importantly, perhaps, is whether the system 
will function in those transition times between each of these 
major conditions. 

It is certainly not sufficient to measure effectiveness 
by simply comparing the "output" of one system with that of 
another. For example, for a new communications system simply 
having a higher message handling rate or a lower bit error 
rate than an existing system does not necessarily improve the 
C3 capability. The effectiveness of a system in improving the
chances of victory in battle, or for achieving organizational
goals, makes it a better C3 system. Since it is often not
possible to test a system under such conditions, simulations 
and models are often used. 

War games are a type of simulation frequently used by the 
military to evaluate C3 effectiveness. Through the use of a
war game evaluators and commanders can determine with a great
deal of accuracy the effectiveness of present and proposed
C3 technologies under simulated warfare conditions. Such
war games often allow for replication so that a scenario
basically can be replayed using different C3 strategies in
order to evaluate the effectiveness of one system as opposed 
to another. War games are a very cost-effective means of 
running such an evaluation under realistic conditions using 
experienced players. 

1. Manual and Voice Inputs for Games

The most realistic war games today, those which are 
able to be run at a near real-time speed, which are able to 
enter and disseminate a large volume of sensor and fire con- 
trol data, and which are able to regularly and quickly update 
displays are either computer-assisted or computer-run war 
games. Manual war games, although generally no less accurate 
than computer-assisted games, are usually very slow moving, 
require many extra participants to record data and often 
quickly become monotonous and tedious. In a computer-assisted 
war game commands are generally input at a keyboard as is 
usually the case for most other computer-type functions, as 
previously noted. It is certainly plausible to consider using 
voice input devices to run such war games. 

If war games are to be used to evaluate C3 effective-
ness, one facet of such an evaluation certainly could be any
increase in effectiveness provided by a voice recognition 
system as opposed to conventional manual input. In fact a 
war game can be used as a vehicle for testing the concept of 
using voice recognition equipment in any number of other
military applications where high speed of input and low error
rates are necessary. It was with this thought in mind that
the author decided to develop and conduct an experiment com-
paring the use of voice and manual inputs to run a Naval war
game. The author chose the CINCPACFLT version of the Warfare
Environmental Simulator (WES) as the war game to use in this
experiment. WES was chosen mainly because it is easily acces-
sible from the NPS Remote Site Module (RSM) and because the
author was already somewhat familiar with its operation.


C. DESCRIPTION OF THE WARFARE ENVIRONMENTAL SIMULATOR [12] 

The Warfare Environmental Simulator (WES) is a computer- 
assisted war game which runs on a DEC KL-2040 or a PDP-10 
computer at the Naval Ocean Systems Center (NOSC), San Diego, 
California. WES is a two-sided interactive game in which 
Blue and Orange sides can define, structure and control their 
own forces. The game is strictly a Naval war game which 
employs approximately 80 player commands to control the plat- 
forms and sensors engaged in the game. 

Each command position in a WES game contains a graphics 
terminal situation display, an alphanumeric terminal present- 
ing status board displays and another alphanumeric terminal 
for input of player commands. This player terminal acts as 
both an input and an output terminal. While the system is in 
the input mode output messages are queued. The color graphics 
display is driven by a PDP-11/70 which is interfaced with 
NOSC's KL-2040 or PDP-10 via the ARPANET. WES operates under
either the TOPS-20 or TENEX systems.

The WES game is a combination of three major processes,
called BUILD, FORCE and WARGAM. Each of these is an integral 
part of the war game and must be initialized and used prior 
to and during game play. The BUILD process is used to create 
and modify a database of game objects such as ships or shore 
bases. With BUILD a player may add, delete or modify a file 
of game objects in the database. This will normally be done 
prior to game play when determining the forces needed for the 
game. The database contains values for ship classes, shore 
bases, aircraft types, missiles, sensors and weapons. 

The FORCE process creates the actual game scenario to be 
used. With FORCE game objects from BUILD files are organized 
into task hierarchies for use in the game. FORCE specifies 
the actual names and classes of ships, their initial locations, 
courses and speeds along with any associated aircraft, sensors 
and weapons. FORCE allows a player to create new game scena- 
rios, to modify a scenario, to change numeric parameters or 
to input or delete items from a scenario. Contingency plans 
which might be used during a game can also be created and 
entered into the specific game database by using FORCE.

WARGAM actually runs the interactive game based on the 
chosen scenario and the commands input by the players. Once 
initiated it responds to player commands, generates both the 
graphics and the status board displays and updates these 
displays each game minute. The WES graphics display at NPS
uses a GENISCO display processor/CONRAC CRT to display in
color the graphic situation display. This display includes
grid tick marks, background maps, NTDS symbology for friendly, 
neutral and enemy tracks, lines of bearing for passive sensors, 
weapons envelopes and game time. 

Six alphanumeric status board displays are controlled by 
WARGAM and are shown on a user terminal one at a time. The 
player controls the status board functions by depressing 
appropriate keys at the terminal. The six status board dis- 
plays include the following: active track status, passive 
track (ESM) status, friendly ship status, friendly air status, 
friendly shore bases status and flight status. These displays 
then contain all the status information which one would 
expect to find in the CIC on a surface ship. 

The WES commands which players use to control the war game 
are highly formatted in terms of syntax and input parameters. 
Two types of errors are possible when inputting a command. 
First, the syntax may be incorrect. In this case an immediate
warning is issued on the terminal saying that the command can-
not be parsed. This should alert the player to check his
command and then reenter it correctly. Second, a command
might order some impossible action (e.g., addressing a ship
not in the game). No immediate warning is issued in this case 
since the order parses correctly. However, when execution of
the order is attempted it cannot be carried out and this fact
is displayed on the terminal for the player. When an order
is entered correctly, the system responds that the order has
been entered; this indicates that the order was parsed and
sent on for execution, but not that there is no possible 
discrepancy in the order (as noted in the second error case 
above). 
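
The two error cases can be illustrated with a small two-stage check.
The sketch below is not WES itself: the command format
("<SHIP> COURSE <degrees>"), the response strings and the function
names are all hypothetical, chosen only to show why a syntactically
valid order can be acknowledged and still fail later. The ship names
are those of the scenario described in Section III.

```python
# Ships assumed to be in the game (names from the scenario in Section III).
GAME_SHIPS = {"ENTERPRISE", "BERKELEY", "STURGEON"}

def parse(command):
    """Syntax check only. Hypothetical format: '<SHIP> COURSE <degrees>'."""
    parts = command.split()
    if len(parts) != 3 or parts[1] != "COURSE" or not parts[2].isdigit():
        return None
    return parts[0], int(parts[2])

def submit(command, pending):
    """First stage: an immediate warning on a syntax error, else acknowledge."""
    order = parse(command)
    if order is None:
        return "CANNOT PARSE"
    pending.append(order)   # acknowledged, but not yet checked against the game
    return "ORDER ENTERED"

def execute(order):
    """Second stage: an impossible order is detected only at execution time."""
    ship, course = order
    if ship not in GAME_SHIPS:
        return f"CANNOT EXECUTE: {ship} NOT IN GAME"
    return f"{ship} STEERING {course}"

pending = []
print(submit("ENTERPRISE COURSE", pending))   # syntax error, caught at once
print(submit("NIMITZ COURSE 090", pending))   # parses, so it is acknowledged
print(execute(pending[0]))                    # fails only when executed
```

For an input-mode comparison the distinction matters because only the
first kind of error is reported at the moment of entry; the second
surfaces later and must be traced back to the order that caused it.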

It was with this game of WES as described above that the 
author conducted his voice/manual input experiment. The 
details and background of the experiment are described in 
Section III and its results in Section IV following. The 
conclusions drawn from the data collected address the feasibil-
ity of using an automatic voice recognition system to run
computer-assisted war games in general, and WES in particular.

III. EXPERIMENTAL DESIGN


A. CONCEPT OF THE EXPERIMENT 

The basic goal of this experiment was to test the feasibil-
ity of operating WES by using voice inputs rather than the
customary manual inputs. This would be accomplished by having 
a number of test subjects individually enter valid WES com- 
mands for BLUE forces while the game was running, recording 
the time necessary to successfully enter the commands and the 
number of errors committed with voice and manual input, and 
then analyzing the data to see whether one entry method was
superior to the other. Although the WES game would be run-
ning, the only commands entered would be for the BLUE forces
and therefore there would be no interaction between BLUE and 
ORANGE, or actual "game play." BLUE-ORANGE interaction was 
not considered necessary for the goals of this experiment. 
However, it was considered important to have WES running dur- 
ing the experiment, rather than having the subjects merely 
type out the WES commands or speak them to a voice recognizer, 
so that the actual interaction with the WES input/output 
player terminal would be accomplished as in a two-sided war 
game. 
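
As a rough illustration of the bookkeeping this design implies, the
sketch below records one entry per subject and input mode and then
summarizes time and errors by mode. The record layout, the function
name and every number in it are invented for illustration; they are
not the experimental data.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Trial:
    subject: str
    mode: str        # "voice" or "manual"
    seconds: float   # time needed to enter the command series successfully
    errors: int      # entry errors committed along the way

def summarize(trials):
    """Return {mode: (mean entry time in seconds, total errors)}."""
    summary = {}
    for mode in sorted({t.mode for t in trials}):
        rows = [t for t in trials if t.mode == mode]
        summary[mode] = (mean(t.seconds for t in rows),
                         sum(t.errors for t in rows))
    return summary

# Invented values, for illustration only.
trials = [
    Trial("S1", "voice", 120.0, 3), Trial("S1", "manual", 100.0, 2),
    Trial("S2", "voice", 140.0, 5), Trial("S2", "manual", 110.0, 1),
]
for mode, (avg_time, total_errors) in summarize(trials).items():
    print(mode, avg_time, total_errors)
```

Keeping one record per subject and mode, rather than only running
totals, leaves the data in a form suitable for whatever statistical
comparison is applied afterwards.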

In order to run WES, as noted in Section II, game forces 
must be assigned and a scenario established. The author chose
to use an existing WES scenario with its associated forces
for the experiment. The CUBA scenario was chosen due to its
relative simplicity and yet entirely adequate forces for the
experimental goals. In this scenario three United States war-
ships, aircraft carrier ENTERPRISE, guided missile destroyer
BERKELEY and nuclear submarine STURGEON, are opposed by three
Soviet warships and one merchant ship in a setting similar to
the 1962 Cuban missile crisis. The test subjects would com-
mand the ships and forces of the BLUE task force by using a
fixed series of commands provided to them. 

It was necessary to establish a basic vocabulary which 
the subjects would use to enter the player commands to WES. 
This vocabulary had to be complete enough to allow formula- 
tion of any of the WES commands [12] which might be necessary 
during play of a game. The vocabulary had to contain all the
scenario-specific words (e.g., ENTERPRISE, BERKELEY) which
might become necessary in order to command those BLUE forces 
in the CUBA WES scenario. Also, the vocabulary had to be 
compatible with both the voice and keyboard methods of entry. 
The vocabulary which was used is considered sufficient to run 
any basic WES game involving the forces in the scenario. The 
total vocabulary amounted to 162 words or short phrases
(Appendix B).


B. EQUIPMENT USED
1. Hardware Description [13] 
For the experiment a Threshold Model T600 discrete
utterance voice recognition unit manufactured by Threshold
Technology, Inc. was used. The T600 is an electronic speech
recognition device which automatically recognizes utterances 
of up to two seconds in duration. These utterances can be of 
several words in length as long as they do not exceed this 
time duration. Since it is a discrete, or isolated speech 
recognition unit there must be a short pause (at least 0.1
second) between utterances. The T600 allows up to 256 
separate voice utterances to be stored in memory. As noted 
above, 162 utterances were the total vocabulary for this 
experiment. 

The Model T600 terminal used in this experiment con-
sists of an analog speech preprocessor, microcomputer,
CRT/keyboard unit, magnetic tape cartridge unit, remote voice 
input unit and noise-cancelling microphone. The speech pre- 
processor and microcomputer are contained in a main terminal 
processor unit. The speech preprocessor accepts spoken input 
from the remote voice input unit, extracts speech parameters 
and converts these to digital signals which are then processed 
by the microcomputer. The microcomputer compares these input
signals with stored reference patterns to determine which
vocabulary words were spoken. The reference patterns for all 
the vocabulary are established during a training phase when 
the user trains the voice recognizer by repeating each of the 
vocabulary utterances ten times. If a close match is found
between an input speech utterance and a reference vocabulary
pattern, the utterance is "recognized" by the T600. It then
sends to the user's computer the appropriate output string 
of characters associated with the recognized input. 

The T600 has three types of memory which the user 
may modify: speech reference patterns, prompt character
strings and output character strings. As noted above, the
speech reference patterns are formed when the user trains the 
voice recognizer by repeating the vocabulary utterances a 
number of times. The prompt character strings are input by 
a user at the keyboard and are displayed on the CRT for each 
utterance to prompt the speaker when he is training that 
particular utterance. The output character string, also 
initially entered via the keyboard, is the actual output 
sent to the user's computer over a communications interface 
by the T600 when an utterance is recognized. The recognized 
utterances are sent exactly as if they had been typed in at 
the keyboard. When spoken, each of the utterances is echoed
on the CRT as a visual display for the operator. 

The speaker uses a noise-cancelling microphone plugged 
into the remote voice input unit while speaking to the T600. 
This microphone allows the T600 to be used in noisy areas. 
The placement of the microphone by the speaker is very impor-
tant during both the training and recognition phases with
the T600. Recognition accuracy may decrease if the microphone
is moved from one position to another in relation to the
speaker's mouth. It should be placed in front of the lips
but not touching them, and slightly to the side of the
speaker's mouth. The microphone should just touch the lower
lip when the lip is extended forward as far as possible. If
the microphone slips from this position while speaking, it
should be readjusted before continuing.

Data in the T600 memory is stored in the main terminal 
processor unit. In conjunction with this the magnetic tape 
cartridge unit, a digital tape recorder, is used to store 
this memory data on a tape cartridge and then to recall it 
from the cartridge whenever desired. The tape, once recorded, 
can be used to quickly retrain the terminal with the user's 
speech patterns and specific vocabulary. This is very useful
when the terminal is used repeatedly by a number of different 
users. 

For this experiment two additional pieces of equipment 
were connected in parallel with the T600 described above. An 
ADM 31 Data Display Terminal with print much smaller than 
that of the T600 was used so that the longer commands input 
by the user would entirely fit on a single line rather than 
"wrapping around" as they would on the T600 CRT. It was felt 
that this would eliminate one possible source of confusion 
for the test subjects. Additionally, a Miniterm Model 1203 
was used in order to obtain a hard copy printout of all the 
voice and manual input commands. This was necessary to
accurately count and differentiate between the types of input
errors. This will be discussed at greater length in Section
IV. The entire equipment set-up as used in the experiment is
shown in Figure 1.
2. Available Input Modes 

The speed of entry and number of errors associated 
with three different input modes were to be evaluated in the 
experiment. Each subject would type the BLUE player commands 
at the ADM 31 terminal, would enter the same commands using 
the unbuffered voice mode of the T600 and would enter the 
commands via the T600's buffered voice mode. The order of 
the input modes was varied from subject to subject in order 
to eliminate any bias which the ordering might have introduced. 

In the typing mode with WES there is no way of cor- 
recting any error once it is typed prior to sending it to the 
game for execution (i.e., no backspace or erase). This is
quite important since a single error will invalidate an entire 
WES command. If an error is made it is best to immediately 
type a carriage return (entering the incorrect order), and 
then retype the order correctly and enter it into the system. 
By doing this time is saved which would otherwise be wasted 
while completely entering a command already containing an 
error, and the possibility of committing further errors in 
this same command is eliminated. 

The unbuffered voice input mode to WES is very
similar to this. The T600 will send the ASCII character
stream associated with any recognized voice input to the
user's host computer without the user being able to correct
any "incorrectly recognized" spoken input. No editing is
possible in the unbuffered voice mode of operation, and there-
fore, like typing, when an error is noted it is best to enter
the command at that point and then reenter it correctly. In
contrast to this the T600's buffered voice mode allows the
user to verify his input stream, make corrections to it if
necessary and then transmit it to the host computer. The
T600 stores the utterances in an internal buffer which may
be modified and the contents of this buffer are sent to the
host in a "block" when the user transmits them.
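
The difference between the two voice modes can be sketched in a few lines of Python. This is a minimal illustration and not the T600's actual logic: the class names, the `sent` list standing in for the host computer, the `erase_last` correction and the sample words are all assumptions made for the example.

```python
class UnbufferedVoice:
    """Each recognized utterance goes to the host at once; no editing."""

    def __init__(self):
        self.sent = []  # hypothetical stand-in for the link to the host

    def recognize(self, output_string):
        self.sent.append(output_string)


class BufferedVoice:
    """Recognized utterances wait in a buffer until the user transmits."""

    def __init__(self):
        self.buffer = []
        self.sent = []  # hypothetical stand-in for the link to the host

    def recognize(self, output_string):
        self.buffer.append(output_string)

    def erase_last(self):
        # Assumed correction: drop the last entry before it reaches the host.
        if self.buffer:
            self.buffer.pop()

    def transmit(self):
        # Send the whole buffer as one block, then clear it.
        self.sent.append("".join(self.buffer))
        self.buffer = []


def demo():
    """'ARM' is spoken but first recognized as 'HARM' (invented example)."""
    unbuffered = UnbufferedVoice()
    for word in ["HARM ", "ARM "]:
        unbuffered.recognize(word)   # the bad word has already been sent

    buffered = BufferedVoice()
    buffered.recognize("HARM ")
    buffered.erase_last()            # corrected before transmission
    buffered.recognize("ARM ")
    buffered.transmit()
    return unbuffered.sent, buffered.sent
```

In the unbuffered case the misrecognized word reaches the host and the command must be reentered. In the buffered case only the corrected block is sent.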


C. SELECTION OF SUBJECTS
1. Backgrounds 

Twelve subjects who participated on a voluntary basis 
were chosen for the experiment. Eleven of the subjects are
military officers (six Navy, four Air Force and one Army) in
paygrades O3 - O5, and one is a civilian professor at NPS.
Ten of the military officers are members of the Command,
Control and Communications (C3) curriculum at NPS and the
eleventh is on the faculty. Two of the twelve subjects 
are female Naval officers. 

All subjects had previously had at least a brief 
exposure to WES while at NPS. However, only one subject, 
the female faculty member, was considered to be experienced
with WES. In addition, all the subjects had at least minimal
experience with voice recognition systems, with six of the
subjects considered "experienced" with voice systems. This
experience was established for four subjects by participating
in a six month controlled voice recognition longitudinal
study, and for the two faculty members by continuous use of
voice systems over a prolonged period of time (more than
three years for the civilian professor). This breakdown of
six experienced subjects and six inexperienced with voice
recognition systems was planned in order to determine whether
prior experience would be a significant factor in determining
the preferred method of command input to WES. A synoptic
background of the twelve test subjects is contained in
Appendix C.
2. Initial Training

Each of the subjects met individually with the author 
and was given a typing ability test. This consisted of a 
five minute typing exercise (similar to that given to a GS-2 
typist) during which the subject was instructed to type two 
given paragraphs totalling 21 lines as quickly and accurately 
as possible without error correction. A subject's speed in
words per minute (wpm) was then calculated with a scoring
table approximately using the formula wpm = total characters/
25. A certain number of errors, increasing with the number
of gross words per minute typed, was permitted, with any
errors in excess of this number resulting in .2 wpm per error
subtracted from the final typing speed.

The typing ability test was given to determine whether
there was a clear cut distinction between typists and non-
typists among the test subjects. Although one subject typed
below 20 wpm and two subjects were above 40 wpm, nine of the
subjects were grouped between 21 and 39 wpm. Due to this
close grouping and the rather short length of the WES com-
mands this difference in typing speeds was not considered
important. The typing test used, along with its scoring
matrix, is shown in Appendix D.
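
The scoring rule described above can be written out as a short Python sketch. It is an approximation under stated assumptions: the function name is invented, and `allowed_errors` is a placeholder for the allowance given by the Appendix D scoring matrix, which is not reproduced here.

```python
def typing_speed_wpm(total_characters, errors, allowed_errors):
    """Approximate net typing speed for the five-minute test.

    allowed_errors stands in for the error allowance of the Appendix D
    scoring matrix (it grows with gross speed); it is supplied by the
    caller because the matrix itself is not reproduced here.
    """
    gross_wpm = total_characters / 25            # wpm = total characters / 25
    excess_errors = max(0, errors - allowed_errors)
    return gross_wpm - 0.2 * excess_errors       # .2 wpm off per excess error
```

For example, 750 characters typed with five errors against an assumed allowance of three gives a gross speed of 30 wpm and a net speed of about 29.6 wpm.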

Each of the subjects next trained the T600 voice
recognition unit using the WES vocabulary of Appendix B.
This training was accomplished by having the subjects repeat
each vocabulary utterance ten times while in the T600 train-
ing mode in order to optimize the stored reference patterns
for their individual speech variations. The average time
required to train the 162 utterance vocabulary was 94 minutes,
with the shortest time being 69 minutes and the longest 116
minutes.

Once the training was completed each utterance was
repeated three additional times while in the T600's recogni-
tion mode to check for recognition accuracy. If at least
two of the three repetitions were correctly recognized,
the utterance was considered to be properly trained.
If not, that vocabulary word was then retrained and again
checked for accuracy. On the average each subject retrained
five utterances (three being the least number retrained and
nine the highest), with phonetically similar expressions such
as HARM/ARM, with/list, back/track/attack and dive/five caus-
ing the most difficulty.
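
The acceptance check applied after training (at least two correct recognitions out of three repetitions) can be expressed as a small Python sketch. The function names and the sample results are hypothetical and serve only to illustrate the rule.

```python
def properly_trained(check_results):
    """check_results: three booleans, one per repetition in recognition mode."""
    return sum(check_results) >= 2


def utterances_to_retrain(results_by_utterance):
    """Return the utterances that fail the two-out-of-three check."""
    return [
        utterance
        for utterance, results in results_by_utterance.items()
        if not properly_trained(results)
    ]


# Invented results for three vocabulary entries.
sample = {
    "HARM": [True, False, False],
    "ARM": [True, True, False],
    "dive": [True, True, True],
}
```

With the invented sample above, only "HARM" would be sent back for retraining.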


D. CONDUCTING THE EXPERIMENT

For the experiment the author had compiled a list of 20 
basic WES commands for the CUBA scenario. These 20 commands 
(Appendix E) totalled 272 voice utterances and used 67 of the 
162 vocabulary utterances considered necessary to run an 
actual WES war game. The author had further divided these 
20 commands into five shorter groups of four commands each 
(Appendix F). The commands in these five groups were arranged 
so that each group would be of approximately the same length. 
(Those utterances in Appendices E and F which consisted of 
more than one word are highlighted as they were for the sub- 
jects during the experiment.) 

Each subject would be required to input the list of 20 
commands and the five shorter lists of commands by the three 
methods of typing, unbuffered voice and buffered voice. The 
order of the input methods and the lists of commands used 
was randomly varied from subject to subject to eliminate any 
bias. When inputting the short lists, whether by typing or
voice, the subjects were given a brief rest between each of
the five lists. The use of the 20 command list and the group
of five lists with breaks between each was designed to see
whether fatigue, frustration, or the prospect of having a
long or short task ahead might have any bearing on the
results of the different entry methods.
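
One way such a randomized ordering could be drawn up is sketched below in Python. The procedure actually used is not described beyond "randomly varied", so the function name, the plan layout and the use of a seeded generator are assumptions made for the example.

```python
import random

ENTRY_METHODS = ["typing", "unbuffered voice", "buffered voice"]
TASK_TYPES = ["20 commands", "five groups of 4 commands"]


def assign_orders(subject_count, seed=0):
    """Give each subject a random order of entry methods and task types."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    plan = []
    for subject in range(1, subject_count + 1):
        plan.append({
            "subject": subject,
            # sample() with k equal to the list length returns a shuffled copy
            "methods": rng.sample(ENTRY_METHODS, k=len(ENTRY_METHODS)),
            "tasks": rng.sample(TASK_TYPES, k=len(TASK_TYPES)),
        })
    return plan
```

Calling `assign_orders(12)` yields one entry per subject, each containing every entry method and both task types exactly once.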

The conceptual design of the experiment is shown in
Figure 2. This is a three-factor nested design with repeated
measures over the tasks. Each subject is nested within only
one of the levels of experience.

Once each subject had finished training the T600 he met 
at a later time with the author to conduct the actual 
experiment. At this point the subjects were given a brief 
overview of what they would be doing along with a verbal set 
of instructions (Appendix G). Since in some cases it had 
been several weeks since the initial voice training all the 
subjects were given a copy of the WES vocabulary in order to 
refresh their memories. In addition the subjects were pro-
vided a list of practice commands (Appendix H) with which they
were allowed to train until they felt at ease and confident 
with the use of the voice recognition system. 

After each subject felt satisfied with his practice the 
experiment was run. The entire list of 20 commands and the 
five groups of commands, depending on the order used, were 
entered into the WES game via the three different input 
methods. While using the voice recognition modes, if an
utterance was misrecognized four consecutive times or an ab-
normally large number of times throughout the experiment, the
author stopped the clock and had the subject retrain that
utterance rather than continue to struggle against the
system. This was done on six occasions. The results of the
experiment are contained in Section IV.


IV. PRESENTATION OF DATA


A. DATA COLLECTION TECHNIQUES

During the typing and the unbuffered voice modes of the 
experiment the Miniterm was used to keep a typescript of all 
commands entered by the subjects and the responses of the 
WES game. During the buffered voice mode the Miniterm was 
not used since the only commands which would have been printed 
were those already corrected by the subject and sent contain- 
ing no errors from the internal buffer. Instead the author 
manually recorded errors during this phase. 

The following measures of performance were recorded during 
all the trials: 1) the time required to complete a specific 
scenario, and 2) the number of input command errors. Input 
errors were divided into two types, recognition errors and 
operator errors. Recognition errors were those encountered 
when the T600 "thought" the subject said one thing but he 
had actually said another. This type of error was not 
applicable to the typing mode. An operator error was any 
other type of error committed which was not attributable to 
the T600 (e.g., a typing mistake, the operator forgetting 
to say "Space" after a number, the operator saying "for"
(and having it recognized as "4") rather than "for the,"
etc.).

In analyzing the data the author was interested in the
actual number of errors committed. Therefore every single
error was counted as a separate error. For example, if the
subject made one typing error, or had one voice utterance
misrecognized during a command, this was counted as one error.
However, if the subject committed two typing errors in the
same command before entering the command, this was counted as
two errors although they only invalidated a single command.
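
The counting rule above separates the number of errors from the number of commands they invalidate. A minimal Python sketch of that distinction follows; the function name and the sample counts are hypothetical.

```python
def tally(errors_per_command):
    """errors_per_command: number of errors made in each entered command.

    Every error counts separately, even when several fall in one command.
    """
    total_errors = sum(errors_per_command)
    invalidated_commands = sum(1 for n in errors_per_command if n > 0)
    return total_errors, invalidated_commands
```

For an invented run of four commands with 0, 1, 2 and 0 errors, `tally([0, 1, 2, 0])` gives three errors spread over two invalidated commands.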


B. GENERAL RESULTS 

As noted earlier, each set of 20 voice commands contained 
272 voice utterances. Each subject was required to input the 
total 20 commands four different times by voice (i.e., the 
list of 20 commands by buffered and unbuffered voice, and 
the five groups of four commands in the same manner). There- 
fore, if no voice errors had been committed, the twelve sub- 
jects would have inputted a total of 13,056 voice utterances 
during the experiment. However, the occurrence of both 
recognition and operator errors, and having to reenter the 
commands which contained these errors, resulted in a some- 
what greater number of voice utterances for the experiment. 
(The author did not physically count this total number.) 
There were 982 recognition errors recorded during the 
experiment. 
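
The utterance total quoted above follows directly from the design, as the short Python check below shows. The variable names are invented. The final ratio is only an upper bound on the per-utterance recognition error rate, offered as an illustration, because the true number of utterances spoken was larger than the error-free minimum and was not counted.

```python
UTTERANCES_PER_20_COMMANDS = 272
VOICE_PASSES_PER_SUBJECT = 4     # two voice modes x two task types
SUBJECTS = 12
RECOGNITION_ERRORS = 982

# Utterances that would have been spoken had no errors occurred.
minimum_utterances = (
    UTTERANCES_PER_20_COMMANDS * VOICE_PASSES_PER_SUBJECT * SUBJECTS
)

# Upper bound only: reentered commands made the real denominator larger.
max_error_fraction = RECOGNITION_ERRORS / minimum_utterances
```

Here `minimum_utterances` is 13,056, and the recognition error fraction therefore cannot exceed roughly 7.5 percent.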

After analyzing the typescript from the unbuffered voice
portion of the experiment, it was found that of the 67
utterances used to form the 20 WES commands, 46 of these
utterances had been misrecognized at least once for some other
vocabulary utterance. Twenty-one of the utterances were
never misrecognized by the T600. In the buffered mode only
the numbers of recognition errors were recorded rather than
the misrecognized words since the author was not able to
keep an accurate record of these.

There were more total errors with each of the voice input 
modes than with the typing mode. The following data were 
found when looking at total number of errors (recognition 
errors + operator errors): typing, 169 total errors; buffered
voice, 542; and unbuffered voice, 701. These figures show 
that the typing mode had 68.8 percent fewer total errors than 
did the buffered voice mode, and 75.9 percent fewer errors 
than the unbuffered voice mode. 

All of the subjects, regardless of typing ability, had 
been inputting data via a keyboard for at least five quarters 
while at NPS, while only six were considered experienced in 
voice entry. In addition, subjects seemed to try to be quite 
precise while typing at the keyboard where they had total 
control over any errors committed as opposed to voice input 
where the T600 might not recognize their utterance. 

As far as time was concerned, the total time required
for all the subjects' typing inputs was 254.35 minutes,
286.17 minutes for buffered voice and 585.7 minutes for un-
buffered voice. Therefore typing was 11.1 percent faster
than buffered voice, and 56.6 percent faster than unbuffered
voice input.
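
The percentages quoted in this section for total errors and total time can be re-derived from the raw totals. The Python sketch below does so; the helper name and the dictionaries are introduced only for this check.

```python
def percent_less(smaller, larger):
    """Percentage by which `smaller` falls below `larger`, to one decimal."""
    return round(100 * (1 - smaller / larger), 1)


total_errors = {"typing": 169, "buffered voice": 542, "unbuffered voice": 701}
total_minutes = {"typing": 254.35, "buffered voice": 286.17,
                 "unbuffered voice": 585.7}

errors_vs_buffered = percent_less(total_errors["typing"],
                                  total_errors["buffered voice"])
errors_vs_unbuffered = percent_less(total_errors["typing"],
                                    total_errors["unbuffered voice"])
time_vs_buffered = percent_less(total_minutes["typing"],
                                total_minutes["buffered voice"])
time_vs_unbuffered = percent_less(total_minutes["typing"],
                                  total_minutes["unbuffered voice"])
```

The four results are 68.8, 75.9, 11.1 and 56.6 percent, matching the figures in the text. Note that "faster" here means a smaller total time, expressed as a fraction of the slower method's time.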


C. RESULTS FOR SCENARIO TIMES

Table 1 shows the time in minutes required for each sub- 
ject to input the list of 20 commands by the three entry 
methods, and Table 2 shows this data for the five groups of 
commands. An analysis of variance [14] was performed on this 
time data and Table 3 gives the statistical results. (The 
task of inputting either the 20 commands or the five groups 
of commands is hereafter referred to as the Task Type.) 

Table 3 shows that there was a statistically significant
difference (at the α = .10 level) in time for experience
level, as can be seen in Figure 3. (An α level of .10, for
example, means that there is only a 10 percent chance or
less that it is wrong to say there was a significant differ-
ence in certain conditions.) The experienced subjects were
able to input the commands faster via all three entry methods, 
and most noticeably by unbuffered voice where the average 
time climbed most steeply for the inexperienced subjects. 

Table 3 also shows that there was a significant difference
(α = .01) in time for entry method. A range test [15] showed
that there was a significant improvement in time with both
typing and buffered voice over unbuffered voice, and that
there was no difference between typing and buffered voice as
far as time is concerned. These results are shown in Figure
4.

Table 2. Time for Five Groups
of 4 Commands Each

Table 3. Analysis of Variance for Time

SOURCE                     df    MS             F
Between Subjects           11
  EL (experience level)     1    [illegible]    3.62067*
  Error                    10    132.6050
Within Subjects            60
  EM (entry method)         2    [illegible]    [illegible]**
  TT (task type)            1    [illegible]    .8093
  EL x EM                   2    432.8107       [illegible]
  EL x TT                   1    [illegible]    .0033
  EM x TT                   2    18.8148        [illegible]
  EL x EM x TT              2    3.0497         [illegible]
  Error                    20    [illegible]
  Error                    10    18.5524
  Error                    20    14.34

 * p < .10
** p < .01

df: degrees of freedom
MS: Mean Square
F: F test ratio

Figure 4. Average Time for Two Experience Levels
(vertical axis: average time per subject in minutes;
horizontal axis: experience level, experienced and inexperienced)

There was also a significant experience level-by-entry method
interaction shown in Table 3. This is shown in both Figures
3 and 4 and was due mainly to the effect of the inexperienced
subjects with unbuffered voice where the average time increased
much more quickly than it did for the experienced subjects.
Table 3 further shows that there was no difference in the
two task types with respect to time. There was also no other
significant interaction shown.


D. RESULTS FOR INPUT ERRORS
1. Recognition Errors
The total number of recognition errors for each sub- 
ject in the two voice entry modes for 20 commands is given 
in Table 4. Table 5 shows this data for the five groups of 
commands. The results of the analysis of variance for this 
data are given in Table 6. 

Table 6 shows that there was no significant difference in 
either experience level, entry method or task type with res- 
pect to recognition errors. Although it is not surprising 
that the entry method and the task type make no difference 
as far as recognition errors are concerned, it is somewhat 
surprising that experience level does not. The author would
have thought the opposite to be true, with experienced sub- 
jects having significantly fewer recognition errors. 

Table 6 does, however, show a significant (α = .05) inter-
action between entry method and task type as depicted in

Table 4. Recognition Errors for
20 Commands

           UNBUFFERED     BUFFERED
SUBJECT    VOICE          VOICE
   1        1             18
   2       24             [illegible]
   3        5              2
   4       28              8
   5       16              8
   6        2              3
   7       69              3
   8       [illegible]    26
   9        9             [illegible]
  10        6              6
  11        5             16
  12       29             20

Table 5. Recognition Errors for
Five Groups of 4 Commands Each

Table 6. Analysis of Variance for
Recognition Errors

[Table entries are not recoverable from the source scan.]

* p < .05

df: degrees of freedom
MS: Mean Square
F: F test ratio


Figure 5. Although the average number of recognition errors 
was greater for 20 commands than for the five groups in un-
buffered voice, the opposite was true for buffered voice.
There is also a significant three-way interaction shown in 
Table 6 between experience level, entry method and task type. 
This interaction is shown in Figure 6. 
2. Operator Errors

Operator errors were all errors other than those 
caused by the T600 voice recognition unit. This included 
such things as typing and spelling errors in the typing mode, 
and basically forgetting the various ground rules, and there- 
fore causing mistakes, while using the voice modes. Table 7 
shows the number of operator errors committed while inputting 
the list of 20 commands, and Table 8 gives this information 
for the groups of commands. Table 9 shows the results of the 
ANOVA performed on this data. 

Table 9 shows a statistically significant difference 
at the α = .05 level in operator errors for entry method. A
range test showed a significant decrease in operator errors 
for buffered voice as compared to both unbuffered voice and 
typing. The range test showed no difference between the 
typing and unbuffered voice modes with respect to operator 
errors. This is shown in Figure 7 where buffered voice has 
fewer operator errors than the other input methods for both
experience levels.

Figure 5. Average Number of Recognition Errors
for Different Entry Methods
(vertical axis: average number of recognition errors per subject;
horizontal axis: unbuffered voice and buffered voice)

Figure 6. Average Number of Recognition Errors
for Different Experience Levels

Table 7. Operator Errors for
20 Commands

                          UNBUFFERED     BUFFERED
SUBJECT    TYPING         VOICE          VOICE
   1       [illegible]    10              8
   2       [illegible]     9              6
   3        4              8              4
   4        6             [illegible]    [illegible]
   5        4              4              3
   6       10             [illegible]     3
   7        7              6             [illegible]
   8        9             [illegible]     9
   9        2              4              2
  10        4             [illegible]     3
  11        3              6             [illegible]
  12       [illegible]     3              2

Table 8. Operator Errors for
Five Groups of 4 Commands Each

                          UNBUFFERED     BUFFERED
SUBJECT    TYPING         VOICE          VOICE
   1       [illegible]     7              6
   2       10             [illegible]     7
   3       13              8              5
   4        6              5              6
   5        3              2             [illegible]
   6       16             10              3
   7        4              3              5
   8        9              8              5
   9        4              2             [illegible]
  10        9              3             [illegible]
  11       [illegible]    [illegible]     3
  12        6              1              5

Table 9. Analysis of Variance for
Operator Errors

SOURCE                     df    MS    F
Between Subjects
  EL (experience level)
  Error
Within Subjects
  EM (entry method)
  TT (task type)
  EL x EM
  EL x TT
  EM x TT
  EL x EM x TT
  Error
  Error
  Error

[Numeric entries are not recoverable from the source scan.]

* p < .05

df: degrees of freedom
MS: Mean Square
F: F test ratio

Figure 7. Average Number of Operator Errors
for Different Experience Levels
(vertical axis: average number of operator errors per subject;
horizontal axis: experience level, experienced and inexperienced)

Table 9 shows that there is no significant difference
in operator errors over either experience level or task type.
There are also no significant interactions shown in the
table.

3. Total Errors

The total errors are the sum of the recognition and
operator errors. The total number of errors for each subject
is given in Table 10 for the task of entering 20 commands, and
in Table 11 for the groups of commands. As for the other
types of errors an analysis of variance was performed on this
data, with the results presented in Table 12.

The results of the ANOVA show a significant differ-
ence in total errors for entry method. A range test showed
a significant decrease in total errors for the typing mode
when compared with both unbuffered and buffered voice. There
was no significant difference between the two different voice
input modes. This result is shown in Figure 8. IT MUST BE
REMEMBERED, HOWEVER, THAT THE TYPING MODE DID NOT INCLUDE
RECOGNITION ERRORS, WHEREAS THE TWO VOICE MODES DID. THERE-
FORE, FOR THE VOICE MODES TOTAL ERRORS ARE THE SUM OF OPERATOR
AND RECOGNITION ERRORS, WHILE FOR TYPING TOTAL ERRORS ARE THE
SAME AS OPERATOR ERRORS. THIS CAN BE SEEN BY COMPARING THE
CURVES FOR TYPING IN FIGURES 7 AND 8 WHICH SHOW TYPING WITH
THE EXACT SAME TREND BECAUSE THERE COULD BE NO VOICE RECOGNI-
TION ERRORS UNDER THE TYPING METHOD.


Table 10. Total Errors for
20 Commands

[Table entries are not recoverable from the source scan.]

Table 11. Total Errors for
Five Groups of 4 Commands Each

                          UNBUFFERED     BUFFERED
SUBJECT    TYPING         VOICE          VOICE
   1       [illegible]    [illegible]    20
   2       10             [illegible]    50
   3       [illegible]    [illegible]    [illegible]
   4        6             18             [illegible]
   5        5             [illegible]    16
   6       16             16              6
   7        4             [illegible]    38
   8        9             78             [illegible]
   9        4             [illegible]    [illegible]
  10        9              8              9
  11       [illegible]    [illegible]    48
  12        6             [illegible]    29

Table 12. Analysis of Variance for
Total Errors

SOURCE                     df    MS    F
Between Subjects
  EL (experience level)
  Error
Within Subjects
  EM (entry method)
  TT (task type)
  EL x EM
  EL x TT
  EM x TT
  EL x EM x TT
  Error
  Error
  Error

[Numeric entries are not recoverable from the source scan.]

* p < .01

df: degrees of freedom
MS: Mean Square
F: F test ratio

Figure 8. Average Number of Total Errors
for Different Experience Levels

Table 12 shows that there is no significant difference
in total errors over either experience level or task type.
In addition, there are no significant interactions in the
area of total errors.


V. CONCLUSIONS AND RECOMMENDATIONS


A. EXPERIMENTAL CONCLUSIONS 

Based on the results of this experiment, twelve test 
subjects were able to input WES commands to the war game 
faster and with fewer total errors using the manual typing 
input mode than with two voice input modes. Experienced 
voice subjects input the commands faster than the inexperienced 
subjects, but experience level made no difference as far as 
the total number of errors committed was concerned. Typing 
was significantly better as far as total errors were concerned, but there
was no statistical difference between typing and buffered 
voice modes as far as time was concerned. Finally, for time 
and total errors, it made no difference which of the two task 
types was being performed. 

The results suggest that manual input is certainly supe- 
rior to unbuffered voice, and in some respects to buffered 
voice input in this experiment. However, the author feels 
that this must be qualified by looking at the unique situa-
tion in which the input methods were being used. WES com-
mands are highly formatted and must be entered with no errors.
This requirement caused many commands to be rejected and
resulted in the definite infeasibility of using unbuffered
voice input with WES. It simply took too long and resulted
in too many errors. The buffered voice mode held its own
with typing when considering input time, and it was actually
better than typing for operator errors.

In a task such as running WES, when the goal must be 
perfection in entering all player commands in order to 
actually play the game, if the time required for two differ- 
ent input methods is the same, then it appears that their 
error rates are insignificant. In this experiment there was
no statistical difference in time for the typing and buffered 
voice input methods, so the fact that buffered voice had more 
total errors really makes no difference. The lists of com- 
mands were input and accepted by WES in the same amount of 
time regardless of errors. 

There are also possible intangible benefits associated 
with the use of voice input to a computer, whatever its pur- 
pose might be. One such benefit might be the ability of 
supervisors or commanders to hear what is being told to or 
asked of a computer while they are still engaged in other 
activities. This would eliminate several people leaning 
over the shoulder of the operator trying to see what he is 
typing into the computer, allowing the operator to perform 
his job more easily and probably increasing the total effi-
ciency in the work area.


B. RECOMMENDATIONS FOR FURTHER STUDY 
Voice recognition, although very promising in many fields,
certainly is not a panacea in all areas of input to
computers. This can be seen by the unbuffered voice results
with WES. This should not, however, slow down the research
being done in the field of voice recognition. Studies cited
earlier point out very promising uses of voice recognition.
The author believes that further research should be done,
using the buffered voice mode, during WES games to test its
validity in actual use. This could be done quite easily as
thesis research work at NPS, in the C3 laboratory course at
NPS, or in conjunction with scheduled war games involving
NPS, CINCPACFLT and NOSC.

In this experiment the subjects were divided into expe- 
rienced and inexperienced groups as far as voice recognition 
systems were concerned. However, the fact that the subjects 
were not experienced with WES was never taken into account. 
Another possible experimental factor might be to compare the 
results of experienced and inexperienced WES users. Although 
increasing the variables like this would make it more diffi- 
cult to find the required number of subjects, this could be 
done at NPS in the C3 curriculum where the students take
almost all of the same classes for six quarters. 

Further research also should be done in the NPS RSM, 
perhaps in conjunction with the WES games proposed above, to 
study the effects of background and ambient noise on the 
reliability of the voice recognition equipment. There will
surely be this noise problem in any operational use of voice
equipment in a command center, CIC or aircraft, and this
could easily be simulated by introducing noise while using
the equipment at NPS.

Research into the possible uses of voice recognition 
equipment in aircraft, intelligence, war gaming and other 
operational uses is presently ongoing at NPS. These efforts
will result in much new information on the uses and drawbacks
of automatic voice recognition. Truly operational, rather
than merely scholarly and scientific, study in this field
must be continued if we are to reap any benefits from this
new technology available today. This should be an ongoing
endeavor at NPS, and in the C3 curriculum particularly, where
there is such promise and demand for this type of technology
today and in the future.
APPENDIX A 
VOICE STUDIES AT NPS 


This thesis is one of several voice recognition research 
projects conducted for Professor G. K. Poock at NPS over the 
last several years. The complete list, in addition to this 
thesis, includes: 


Armstrong, J. W., The Effects Of Concurrent Motor Tasking On 
Performance Of A Voice Recognition System, Masters Thesis, 
Naval Postgraduate School, Monterey, 1980. 

Batchellor, M. P., Investigation Of Parameters Affecting Voice 
Recognition Systems In C3 Systems, Masters Thesis, Naval 
Postgraduate School, Monterey, 1981. 

Peagaw, P. H., Investigation Of Voice Input For Constructing 
Joint Chiefs Of Staff Emergency Action Messages, Masters 
Thesis, Naval Postgraduate School, Monterey, 1981. 

Jay, G. T., An Experiment In Voice Data Entry For Imagery 
Intelligence Reporting, Masters Thesis, Naval Postgraduate 
School, Monterey, 1981. 

Naval Postgraduate School Report NPS54-80-010, The Effects 
Of Certain Background Noises On The Performance Of A Voice 
Recognition System, by R. Elster, September 1980. 

Naval Postgraduate School Report NPS55-80-016, Experiments 
With Voice Input For Command And Control: Using Voice Input 
To Operate A Distributed Computer Network, by G. K. Poock, 
April 1980. 

Naval Postgraduate School Report NPS55-81-003, Examination Of 
Voice Recognition System To Function In A Bilingual Mode, by 
D. E. Neil and T. Andreason, February 1981. 

Taggart, J. L. and Wolfe, C. D., Speech Recognition As An 
Input Medium For Preflight In The P3C Aircraft, Masters 
Thesis, Naval Postgraduate School, Monterey, 1981. 
APPENDIX B 

WES VOCABULARY 

A. BASIC WES WORDS 

one                         two 
three                       four 
five                        six 
seven                       eight 
nine                        zero 
a                           S 
e                           air 
all                         altitude 
at                          attack 
back                        barrier 
bearing                     bingo 
blue                        by 
cancel                      carriage return 
course                      cover 
degrees                     delay 
designate                   distance 
dive                        drop 
east                        end 
enemy                       envelope 
execute                     exsup 
find distance from          fire 
fire a                      for 
forces                      friendly 
go                          guide 
heading                     help 
if attacked                 eine 
kill word                   label 
launch                      lay a barrier from 
lay a minefield from        list 
maneuver delay              map 
minefield                   minutes 
name                        neutral 
north                       now 
oni                         off 
on                          on contact 
orange                      orders 
other                       own 
pass control of             place 
place a circle              place a marker 
player                      plot 
point                       position 
pounds from                 probability 
probability of detection    proceed 
refuel                      report 
self                        send it 
sensor delay                south 
space                       speed 
station                     submarine 
surface                     target 
time                        to 
track                       unknown 
using                       west 
what is the                 with 

B. SCENARIO SPECIFIC WES WORDS 

A181                        A182 
A183                        A6E1 
A6E2                        ALR59 
ARM                         ASMD 
ASROC                       AWG9 
BERKELEY                    Blue1 
BPS14                       BQQ3 
CBU24                       E2C 
EA3                         EA6B 
ENTERPRISE                  ESM 
F14B                        for BERKELEY 
for ENTERPRISE              for STURGEON 
G554                        HARM 
Harpoon                     KA6D 
Maverick                    MK46 
MK48                        MK82 
MK83                        MK84 
Phoenix                     Redeye 
RF18B                       Sea Sparrow 
Sidewinder                  SbOly 
SLQ32                       Sonobuoy Active 
Sonobuoy Passive            Sparrow 
SPN43                       SPS10 
SPS40                       SPS48 
SPS49                       SOs25 
STURGEON                    ciate cl tore 
Tomahawk                    Walleye 
Walleye2                    WLR6 





APPENDIX C 

TEST SUBJECTS' BACKGROUNDS 

Subject    Status     Voice Experience    Typing (wpm) 
1          student    experienced         32 
2          student    experienced         59 
3          student    experienced         46 
4          student    experienced         17 
5          faculty    extensive           34 
6          faculty    extensive           39 
7          student    minimal             2a 
8          student    minimal             39 
9          student    minimal             38 
10         student    minimal             Ba 
11         student    minimal             a7 
12         student    minimal             26 





APPENDIX D 


TYPING ABILITY TEST 


Because they have often learned to know types of archi- 
tecture by decoration, casual observers sometimes fail to 
realize that the significant part of a structure is not the 
ornamentation but the body itself. Architecture, because of 
its close contact with human lives, is peculiarly and in- 
timately governed by climate. For instance, a home built for 
comfort in the cold and snow of the northern areas of this 
country would be unbearably warm in a country with weather 
such as that of Cuba. A Cuban house, with its open court, 
would prove impossible to heat in a northern winter. 

Since the purpose of architecture is the construction of 
shelters in which human beings may carry on their numerous 
activities, the designer must consider not only climatic con- 
ditions, but also the function of a building. Thus, although 
the climate of a certain locality requires that an auditorium 
and a hospital have several features in common, the purposes 
for which they will be used demand some difference in struc- 
ture. For centuries builders have first complied with these 
two requirements and later added whatever ornamentation they 
wished. Logically, we should see as mere additions, not as 
basic parts, the details by which we identify architecture. 

wpm (errors allowed) for the 1st and 2nd typing of the exercise 
(5 minutes maximum) 

Line Number    1st typing    2nd typing 
 1             Ze            SA) 
 2             ECan)         54(7) 
 3             ae)           SOnco)) 
 4             9( )          59(8) 
 5             EZ)           69) 
 6             14( )         64(9) 
 7             Gs)           66(10) 
 8             LSC)          68(10) 
 9             Zee)          POL aka 
10             Zant )        731) 
11             Z Os)         TOILE), 
12             Zo)           qo 2) 
13             SN,           80(12) 
14             Sr")          -- 
15             SS) (en)      -- 
16             38( )         -- 
17             40(3)         -- 
18             42(4)         -- 
19             44(5)         -- 
20             47(6)         -- 
21             49(6)         -- 
APPENDIX E 


LIST OF 20 WES COMMANDS 


1. For Enterprise launch 2 F14B course 090 altitude 
    15000 bingo 999 name 1F14B. 
2. For Berkeley attack enemy surface on contact using G554. 
3. Find distance from Enterprise to 42N 57W. 
4. For Sturgeon course 090 speed 15. 
5. Place a circle Enterprise 150 time 15 999. 
6. For Sturgeon report all surface using BQQ3. 
7. For Berkeley fire a harpoon target enemy surface 
    sensor delay 2 heading 120. 
8. Pass control of 1F14B to Blue1. 
9. For 1F14B lay a minefield from 26N 42W bearing 
    135 distance 10 using MK82. 
10. Place a marker 57N 71W time 23 300. 
11. For 1F14B proceed course 215 distance 115. 
12. For Berkeley station bearing 000 distance 3 
    guide Enterprise. 
13. For Sturgeon attack enemy submarine on contact 
    using MK48. 
14. For 1F14B altitude 20000 speed 600 course 090. 
15. For Enterprise report all air using SPS49 
    time 00 999. 
16. For 1F14B attack enemy air on contact using Phoenix. 
17. For Enterprise launch 1 KA6D course 000 
    altitude 10000 bingo 120 name 1KA6D. 
18. Plot all surface Enterprise 100. 
19. For Berkeley report enemy forces using SLQ32 
    time 00 120. 
20. For 1F14B refuel 6000 pounds from 1KA6D. 
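
The vocabulary in Appendix B contains multi-word phrases (for example, "find distance from" and "sensor delay") alongside single words, and the subjects were told to speak each such phrase as a single utterance. As an illustration only, the following Python sketch groups the words of a typed WES command into utterances by greedy longest match against a phrase list. The function name `to_utterances`, the `PHRASES` subset and the matching strategy are assumptions made for this example and are not part of WES or the T600. The sketch also leaves numbers and designators such as 42N as single tokens, which may not match how they were actually spoken.

```python
# Hypothetical subset of the multi-word phrases listed in Appendix B.
PHRASES = {
    "carriage return", "find distance from", "fire a", "if attacked",
    "kill word", "lay a barrier from", "lay a minefield from",
    "maneuver delay", "on contact", "pass control of", "place a circle",
    "place a marker", "pounds from", "probability of detection",
    "send it", "sensor delay", "what is the",
}


def to_utterances(command, phrases=PHRASES):
    """Group the words of a command into utterances, longest phrase first."""
    words = command.rstrip(".").split()
    # Longest phrase length, in words, that is worth trying at each position.
    longest = max((len(p.split()) for p in phrases), default=1)
    utterances = []
    i = 0
    while i < len(words):
        # Try the longest candidate first; a single word always matches.
        for n in range(min(longest, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate.lower() in phrases:
                utterances.append(candidate)
                i += n
                break
    return utterances


print(to_utterances("Find distance from Enterprise to 42N 57W."))
```

Matching the longest phrase first keeps "place a marker" from being split into the shorter vocabulary word "place" followed by stray words.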
APPENDIX F 


FIVE SETS OF FOUR WES COMMANDS 


For Berkeley attack enemy surface on contact using G554. 
Find distance from Enterprise to 42N 57W. 
For Berkeley fire a harpoon target enemy surface 
    sensor delay 2 heading 120. 
Plot all surface Enterprise 100. 

For Sturgeon course 090 speed 15. 
Place a circle Enterprise 150 time 15 999. 
For Enterprise launch 2 F14B course 090 
    altitude 15000 bingo 999 name 1F14B. 
For Berkeley report enemy forces using SLQ32 
    time 00 120. 

Pass control of 1F14B to Blue1. 
Place a marker 57N 71W time 23 300. 
For Berkeley station bearing 000 distance 3 
    guide Enterprise. 
For Enterprise launch 1 KA6D course 000 
    altitude 10000 bingo 120 name 1KA6D. 

For Sturgeon report all surface using BQQ3. 
For 1F14B proceed course 215 distance 115. 
For Sturgeon attack enemy submarine on contact 
    using MK48. 
For 1F14B attack enemy air on contact using Phoenix. 

For 1F14B lay a minefield from 26N 42W 
    bearing 135 distance 10 using MK82. 
For Enterprise report all air using SPS49 
    time 00 999. 
For 1KA6D altitude 20000 speed 600 course 090. 
For 1F14B refuel 6000 pounds from 1KA6D. 
APPENDIX G 


INSTRUCTIONS FOR SUBJECTS 


You will be inputting a set of 20 commands and five sets 
of four commands each to the WES game by typing, unbuf- 
fered and buffered voice. 

If you make a mistake in either the typing or unbuffered mode, 
enter a carriage return right away to save time, since the 
mistake can't be corrected. Then reenter the command correctly. 

In the buffered mode you can use kill word or kill line 
to make changes before entering your commands. 

Input the commands as quickly as possible since you are 
being timed, but they must also be 100 percent accurate 
and accepted by WES. 

Remember to input a "Space" after numbers you enter. All 
words automatically have a space with them. 

Remember that the words "for" and "to" were trained as 
"for the" and "to the" to differentiate them from the 
numbers 4 and 2. If you forget "the," the utterance will 
be recognized as the number. 

All phrases which were trained as a single utterance 
(e.g., pass control of) are highlighted in yellow so you 
won't have to try to remember the phrases. Remember to 
speak them as a single utterance. 

Ensure the microphone is correctly positioned, and if it 
moves, stop and reposition it. 

The green READY light must be on for the T600 to accept 
your utterance. Allow a short pause between each utter- 
ance for it to come back on. 

Use of a forceful tone of voice produces the best results. 
Try not to draw out the utterance with a breathing noise 
at the end. 
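
The difference between the two voice modes in these instructions can be pictured with a small model: in the buffered mode, recognized utterances are held until the command is entered, so "kill word" and "kill line" can still change them. The Python sketch below is only an illustration under stated assumptions. The class `VoiceBuffer`, the helper `run`, the choice of "carriage return" as the utterance that enters the command, and the reading of "kill word" as deleting the most recent utterance are all hypothetical and do not describe the actual T600 or WES software.

```python
class VoiceBuffer:
    """Toy model of the buffered voice mode (hypothetical, not the T600)."""

    def __init__(self):
        self.words = []  # recognized utterances not yet sent to WES

    def hear(self, utterance):
        """Handle one recognized utterance; return a line if one is sent."""
        if utterance == "kill word":
            # Assumption: drops only the most recent utterance.
            if self.words:
                self.words.pop()
            return None
        if utterance == "kill line":
            # Assumption: discards everything buffered so far.
            self.words.clear()
            return None
        if utterance == "carriage return":
            # Assumption: this utterance enters the buffered command.
            line = " ".join(self.words)
            self.words.clear()
            return line
        self.words.append(utterance)
        return None


def run(utterances):
    """Feed a sequence of utterances through a buffer; collect sent lines."""
    buffer = VoiceBuffer()
    sent = []
    for utterance in utterances:
        line = buffer.hear(utterance)
        if line is not None:
            sent.append(line)
    return sent


print(run(["for", "Berkeley", "station", "kill word", "attack",
           "carriage return"]))
```

In this model a misrecognized word costs one extra utterance to remove, whereas in the unbuffered mode the whole command has to be abandoned and reentered.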
APPENDIX H 
PRACTICE VOICE COMMANDS 


Wemesee lay a barrier from 36N 76W bearing 180 
distance 100 using sonobuoy passive. 

For 1F14B bingo. 

For EA3 proceed position 27N 183E. 

For 1F14B speed 1200 course 090 altitude 10000. 

Designate Enterprise 77.1. 